Probabilistic Latent Maximal Marginal Relevance
Not recorded
Abstract
Diversity has been heavily motivated in the information retrieval literature as an objective criterion for result sets in search and recommender systems. Perhaps one of the best-known and most widely used algorithms for result set diversification is Maximum Marginal Relevance (MMR). In this paper, we show that while MMR is somewhat ad hoc and motivated from a purely pragmatic perspective, we can derive a more principled variant via probabilistic inference in a latent variable graphical model. This novel derivation presents a formal probabilistic latent view of MMR (PLMMR) that (a) removes the need to manually balance relevance and diversity parameters, (b) shows that specific definitions of relevance and diversity metrics appropriate to MMR emerge naturally, and (c) formally derives variants of latent semantic indexing (LSI) similarity metrics for use in PLMMR. Empirically, PLMMR outperforms MMR with standard term-frequency-based similarity and diversity metrics, since PLMMR maximizes latent diversity in the results.
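For orientation, the baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of classic greedy MMR with term-frequency cosine similarity, not the PLMMR derivation from the paper. The function names (`cosine`, `mmr`), the whitespace tokenizer, and the toy documents are all assumptions made for the example. The parameter `lam` is the manually tuned relevance/diversity trade-off that, according to the abstract, PLMMR no longer requires.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: str, docs: list[str], k: int, lam: float = 0.5) -> list[int]:
    """Greedy MMR: return indices of up to k documents, in selection order.

    Each step picks the document maximizing
        lam * sim(doc, query) - (1 - lam) * max_sim(doc, already selected).
    """
    q = Counter(query.lower().split())  # assumption: naive whitespace tokens
    vecs = [Counter(d.lower().split()) for d in docs]
    selected: list[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(vecs[i], q)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy corpus: two near-duplicates and one distinct document.
docs = [
    "apple pie recipe",
    "apple pie recipe easy",
    "apple orchard tour",
]
print(mmr("apple pie", docs, k=2, lam=0.3))  # diversity-weighted
print(mmr("apple pie", docs, k=2, lam=1.0))  # relevance only
```

With a low `lam` the second pick skips the near-duplicate in favor of the distinct document, while `lam=1.0` reduces to plain relevance ranking. How sensitive the result is to this hand-set weight is the pragmatic aspect of MMR that the abstract describes as ad hoc.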
Similar resources
Query-Focused Multidocument Summarization Based on Hybrid Relevance Analysis and Surface Feature Salience
Query-focused multidocument summarization is to synthesize from a set of topic-related documents a brief, well-organized, fluent summary for the purpose of answering an information need that cannot be met by just stating a name, date, quantity, etc. In this paper, the task is essentially treated as a sentence retrieval task. We propose a hybrid relevance analysis to evaluate the relevance of a ...
Summarizing Relevant Information for Question-Answering Using Hybrid Relevance Analysis and Surface Feature Salience
Much research for question-answering aims to answer factoid, definitional and biographical questions. In most cases, the answers are given as a name, date, quantity, and so on. In this paper, we try to merge techniques of multidocument summarization and question-answering to generate a brief, well-organized fluent summary to provide more relevant information for the purpose of answering real-wo...
Extractive summarization of meeting recordings
Several approaches to automatic speech summarization are discussed below, using the ICSI Meetings corpus. We contrast feature-based approaches using prosodic and lexical features with maximal marginal relevance and latent semantic analysis approaches to summarization. While the latter two techniques are borrowed directly from the field of text summarization, feature-based approaches using proso...
Spike and Slab Gaussian Process Latent Variable Models
The Gaussian process latent variable model (GPLVM) is a popular approach to non-linear probabilistic dimensionality reduction. One design choice for the model is the number of latent variables. We present a spike and slab prior for the GP-LVM and propose an efficient variational inference procedure that gives a lower bound of the log marginal likelihood. The new model provides a more principled...
TREC 2010 Blog Track: Top Stories Identification
This paper describes our participation in the TREC 2010 Blog Track. For the Top Stories Identification Task, we explore the relationship among news events, news stories and blog posts. We first extract important news events from the TRC2 corpus using a probabilistic mixture model. Then, we propose a probabilistic approach to identify top news stories. Furthermore, we use an additional feature t...
Publication date: 2010